-
Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can an agent learn such a state representation, and how can it detect when it has found one? We introduce a metric that can accomplish both objectives without requiring access to, or knowledge of, an underlying, unobservable state space. Our metric, the λ-discrepancy, is the difference between two distinct temporal difference (TD) value estimates, each computed using TD(λ) with a different value of λ. Since TD(λ=0) makes an implicit Markov assumption and TD(λ=1) does not, a discrepancy between these estimates is a potential indicator of a non-Markovian state representation. Indeed, we prove that the λ-discrepancy is exactly zero for all Markov decision processes and almost always non-zero for a broad class of partially observable environments. We also demonstrate empirically that, once detected, minimizing the λ-discrepancy can help with learning a memory function to mitigate the corresponding partial observability. We then train a reinforcement learning agent that simultaneously constructs two recurrent value networks with different λ parameters and minimizes the difference between them as an auxiliary loss. The approach scales to challenging partially observable domains, where the resulting agent frequently performs significantly better (and never performs worse) than a baseline recurrent agent with only a single value network.
Free, publicly-accessible full text available December 1, 2025.
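A minimal tabular sketch of the λ-discrepancy follows, under assumptions that are not taken from the abstract: values are indexed directly by observation, the data are a few weighted, deterministic episodes, each TD(λ) estimate is the batch fixed point of the λ-return, and the discrepancy is measured as the maximum absolute difference over observations. The helper names (`lambda_returns`, `td_lambda_fixed_point`, `lambda_discrepancy`) and the toy episodes are hypothetical.

```python
from collections import defaultdict


def lambda_returns(rewards, next_values, gamma, lam):
    """Backward recursion G_t = r_t + gamma * ((1 - lam) * V(o_{t+1}) + lam * G_{t+1}).

    rewards[t] is the reward received after observation t; next_values[t] is
    V(o_{t+1}), with 0.0 for the final (terminal) transition."""
    out = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1 - lam) * next_values[t] + lam * g)
        out[t] = g
    return out


def td_lambda_fixed_point(episodes, gamma, lam, iters=500):
    """Iterate V(o) <- weighted mean of the lambda-returns from every visit to o.

    episodes: list of (weight, observations, rewards) with equal-length lists."""
    values = defaultdict(float)
    for _ in range(iters):
        totals, weights = defaultdict(float), defaultdict(float)
        for w, obs, rews in episodes:
            nxt = [values[o] for o in obs[1:]] + [0.0]  # terminal value is 0
            for o, g in zip(obs, lambda_returns(rews, nxt, gamma, lam)):
                totals[o] += w * g
                weights[o] += w
        values = defaultdict(float, {o: totals[o] / weights[o] for o in totals})
    return values


def lambda_discrepancy(episodes, gamma, lam1=0.0, lam2=1.0):
    """Max absolute difference between two TD(lambda) value estimates (assumed norm)."""
    v1 = td_lambda_fixed_point(episodes, gamma, lam1)
    v2 = td_lambda_fixed_point(episodes, gamma, lam2)
    return max(abs(v1[o] - v2[o]) for o in set(v1) | set(v2))


# Two equally likely episodes whose second observation is aliased ("C").
ALIASED = [(0.5, ["A", "C"], [0.0, 1.0]), (0.5, ["B", "C"], [0.0, -1.0])]
# The same episodes with the hidden state revealed ("C1" vs "C2").
MARKOV = [(0.5, ["A", "C1"], [0.0, 1.0]), (0.5, ["B", "C2"], [0.0, -1.0])]
```

With `gamma=0.9`, TD(0) bootstraps from the shared value of "C" (0.0) and so estimates 0 for both "A" and "B", while TD(1) uses the actual returns (0.9 and -0.9); `lambda_discrepancy(ALIASED, 0.9)` is therefore 0.9. Revealing the hidden state makes the two estimates agree, and `lambda_discrepancy(MARKOV, 0.9)` is 0.0, in line with the abstract's claim for Markov decision processes.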
-
Provenance records from sediments deposited offshore of the West Antarctic Ice Sheet (WAIS) can help identify past major ice retreat, thus constraining ice‐sheet models projecting future sea‐level rise. Interpretations from such records are, however, hampered by the ice obscuring Antarctica's geology. Here, we explore central West Antarctica's subglacial geology using basal debris from within the Byrd ice core, drilled to the bed in 1968. Sand grain microtextures and a high kaolinite content (∼38–42%) reveal that the debris consists predominantly of eroded sedimentary detritus, likely deposited initially in a warm, pre‐Oligocene, subaerial environment. Detrital hornblende ⁴⁰Ar/³⁹Ar ages suggest proximal late Cenozoic subglacial volcanism. The debris has a distinct provenance signature: common Permian‐Early Jurassic mineral grains; absent early Ross Orogeny grains; a high kaolinite content; and high ¹⁴³Nd/¹⁴⁴Nd and low ⁸⁷Sr/⁸⁶Sr ratios. Detecting this “fingerprint” in Antarctic sedimentary records could imply major WAIS retreat, revealing the WAIS's sensitivity to future warming.
-
Principled decision-making in continuous state-action spaces is impossible without some assumptions. A common approach is to assume Lipschitz continuity of the Q-function. We show that, unfortunately, this property fails to hold in many typical domains. We propose a new coarse-grained smoothness definition that generalizes the notion of Lipschitz continuity, is more widely applicable, and allows us to compute significantly tighter bounds on Q-functions, leading to improved learning. We provide a theoretical analysis of our new smoothness definition, and discuss its implications and impact on control and exploration in continuous domains.
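The standard Lipschitz argument, and the way it breaks down, can be sketched in one dimension. If Q is L-Lipschitz, each known value Q(x_i) confines Q(x) to Q(x_i) ± L·|x − x_i|. A step-shaped Q (a hypothetical example, such as value jumping across a goal boundary) has no finite L, so the slope estimated from samples keeps growing as the samples get denser. The paper's coarse-grained smoothness definition is not reproduced here, and the helper names are hypothetical.

```python
import numpy as np


def lipschitz_bounds(x, xs, qs, L):
    """Lower and upper bounds on Q(x) implied by L-Lipschitz continuity,
    given known values qs at 1-D points xs."""
    d = np.abs(np.asarray(xs, dtype=float) - x)
    qs = np.asarray(qs, dtype=float)
    lower = np.max(qs - L * d)  # Q(x) >= Q(x_i) - L * |x - x_i| for every i
    upper = np.min(qs + L * d)  # Q(x) <= Q(x_i) + L * |x - x_i| for every i
    return float(lower), float(upper)


def empirical_lipschitz(f, xs):
    """Largest slope |f(a) - f(b)| / |a - b| between neighbouring sample points."""
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = f(xs)
    return float(np.max(np.abs(np.diff(ys)) / np.diff(xs)))


def step_q(s):
    """Hypothetical discontinuous Q-function: 0 below s = 0.5, 1 at or above it."""
    return np.where(s < 0.5, 0.0, 1.0)


coarse = empirical_lipschitz(step_q, np.linspace(0, 1, 11))
fine = empirical_lipschitz(step_q, np.linspace(0, 1, 101))
```

With Q known to be 0 at x = 0 and 1 at x = 1, `lipschitz_bounds(0.5, [0.0, 1.0], [0.0, 1.0], 1.0)` pins Q(0.5) to exactly 0.5, whereas L = 2 only gives the interval (0.0, 1.0). For the step function, the estimated constant is about 10 on an 11-point grid and about 100 on a 101-point grid, so any bound built on it becomes vacuous near the discontinuity.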
-
Volkert, Michael R. (Ed.)
A protein roadblock forms when a protein binds DNA and hinders translocation of other DNA-binding proteins. These roadblocks can have significant effects on gene expression and regulation as well as DNA binding. Experimental methods for studying the effects of such roadblocks often target endogenous sites or introduce non-variable specific sites into DNAs to create binding sites for artificially introduced protein roadblocks. In this work, we describe a method to create programmable roadblocks using dCas9, a cleavage-deficient mutant of the CRISPR effector nuclease Cas9. The programmability allows us to custom-design target sites in a synthetic gene intended for in vitro studies. These target sites can be coded with multivalency, in our case internal restriction sites that can be used in validation studies to verify complete binding of the roadblock. We provide full protocols and sequences and demonstrate how to use the internal restriction sites to verify complete binding of the roadblock. We also provide example results of the effect of DNA roadblocks on the translocation of the restriction endonuclease NdeI, which searches for its cognate site using one-dimensional diffusion along DNA.
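One design step described above, choosing dCas9 target sites that contain an internal restriction site, can be sketched as a sequence scan. This assumes the commonly used Streptococcus pyogenes Cas9 layout (a 20-nt protospacer followed by an NGG PAM), searches the forward strand only, and uses the EcoRI site GAATTC purely as an illustrative choice; the enzyme, the toy sequence, and the function name `find_roadblock_targets` are hypothetical and are not the authors' design.

```python
import re


def find_roadblock_targets(seq, site, protospacer_len=20):
    """Return (start, protospacer, pam) for every forward-strand target
    (protospacer followed by an NGG PAM) whose protospacer contains `site`."""
    seq = seq.upper()
    # Zero-width lookahead so overlapping candidate targets are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    hits = []
    for m in pattern.finditer(seq):
        protospacer, pam = m.group(1), m.group(2)
        if site in protospacer:
            hits.append((m.start(), protospacer, pam))
    return hits


# Hypothetical synthetic fragment: a 20-nt protospacer with an internal
# GAATTC site, followed by a TGG PAM.
PROTOSPACER = "ACGTACGT" + "GAATTC" + "ACGTAC"
SEQ = "TTTT" + PROTOSPACER + "TGG" + "TTTT"
```

Here `find_roadblock_targets(SEQ, "GAATTC")` returns the single candidate starting at position 4. The idea, as the abstract suggests, is that fully bound dCas9 should protect the internal site, so incomplete protection from digestion would indicate incomplete roadblock binding; a real design would also scan the reverse complement.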
-
Optimistic initialization underpins many theoretically sound exploration schemes in tabular domains; however, in the deep function approximation setting, optimism can quickly disappear if initialized naively. We propose a framework for more effectively incorporating optimistic initialization into reinforcement learning for continuous control. Our approach uses metric information about the state-action space to estimate which transitions are still unexplored, and explicitly maintains the initial Q-value optimism for the corresponding state-action pairs. We also develop methods for efficiently approximating these training objectives, and for incorporating domain knowledge into the optimistic envelope to improve sample efficiency. We empirically evaluate these approaches on a variety of hard exploration problems in continuous control, where our method outperforms existing exploration techniques.
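The idea of keeping the initial optimism only where data are missing can be sketched with a nearest-neighbour distance test. The Euclidean metric on concatenated state-action vectors, the fixed `radius`, the optimistic value r_max / (1 − γ), and the name `optimistic_q` are all assumptions for illustration; the abstract does not specify how the authors estimate which transitions are unexplored.

```python
import numpy as np


def optimistic_q(x, visited, q_learned, r_max, gamma, radius):
    """Return the optimistic value r_max / (1 - gamma) if the state-action
    vector x is farther than `radius` (Euclidean) from every visited pair;
    otherwise fall back to the learned estimate q_learned(x)."""
    q_max = r_max / (1.0 - gamma)
    visited = np.asarray(visited, dtype=float)
    if visited.size == 0:
        return q_max  # nothing explored yet: everything stays optimistic
    dist = np.min(np.linalg.norm(visited - np.asarray(x, dtype=float), axis=1))
    return q_max if dist > radius else q_learned(x)


# Hypothetical data: two visited (state, action) pairs and a flat learned Q.
VISITED = [[0.0, 0.0], [1.0, 0.0]]


def flat_q(x):
    return 0.5
```

A query near a visited pair, such as `optimistic_q([0.1, 0.0], VISITED, flat_q, 1.0, 0.9, 0.5)`, returns the learned 0.5, while a distant one such as `[5.0, 5.0]` keeps the optimistic value of about 10. A hard threshold is the simplest choice; a smoother variant could blend toward the learned estimate as the distance shrinks.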
-
A fundamental assumption of reinforcement learning in Markov decision processes (MDPs) is that the relevant decision process is, in fact, Markov. However, when MDPs have rich observations, agents typically learn by way of an abstract state representation, and such representations are not guaranteed to preserve the Markov property. We introduce a novel set of conditions and prove that they are sufficient for learning a Markov abstract state representation. We then describe a practical training procedure that combines inverse model estimation and temporal contrastive learning to learn an abstraction that approximately satisfies these conditions. Our novel training objective is compatible with both online and offline training: it does not require a reward signal, but agents can capitalize on reward information when available. We empirically evaluate our approach on a visual gridworld domain and a set of continuous control benchmarks. Our approach learns representations that capture the underlying structure of the domain and lead to improved sample efficiency over state-of-the-art deep reinforcement learning with visual features, often matching or exceeding the performance achieved with hand-designed compact state information.
-
Permafrost underlies approximately one quarter of Northern Hemisphere terrestrial surfaces and contains 25–50% of the global soil carbon (C) pool. Permafrost soils and the C stocks within them are vulnerable to ongoing and projected future climate warming. The biogeography of microbial communities inhabiting permafrost has not been examined beyond a small number of sites focused on local-scale variation. Permafrost is different from other soils: perennially frozen conditions dictate that microbial communities do not turn over quickly, possibly providing strong linkages to past environments. Thus, the factors structuring the composition and function of microbial communities may differ from patterns observed in other terrestrial environments. Here, we analyzed 133 permafrost metagenomes from North America, Europe, and Asia. Permafrost biodiversity and taxonomic distribution varied in relation to pH, latitude, and soil depth. The distribution of genes differed by latitude, soil depth, age, and pH. The genes that were most highly variable across all sites were associated with energy metabolism and C assimilation, specifically methanogenesis, fermentation, nitrate reduction, and replenishment of citric acid cycle intermediates. This suggests that adaptations to energy acquisition and substrate availability are among the strongest selective pressures shaping permafrost microbial communities. The spatial variation in metabolic potential has primed communities for specific biogeochemical processes as soils thaw due to climate change, which could cause regional- to global-scale variation in C and nitrogen processing and greenhouse gas emissions.
-
Lakes set in arctic permafrost landscapes can be susceptible to rapid drainage and downstream flood generation. Of many thousands of lakes in northern Alaska, hundreds have been identified as having high drainage potential directly to river systems, and 18 such drainage events have been documented since 1955. In 2018 we began monitoring a large lake with high drainage potential as part of a long‐term hydrological observation network designed to evaluate impacts of land use and climate change. In early June 2022, surface water was observed flowing over a 30‐m wide bluff, with active headward erosion of ice‐rich permafrost soils apparent by late June. This overflow point breached rapidly in early July, draining almost the entire lake within 12 h and generating a 191 m³/s flood to a downstream creek. Water level and turbidity sensors and time‐lapse cameras captured this rapid lake‐drainage event at high resolution. A wind‐driven surface seiche and warming waters following ice‐out helped trigger the initial thermomechanical breach. We estimate that at least 600 MT of lake sediment was eroded, mobilized, and transported downstream. A flood wave peaking at 42 m³/s arrived 14 h after the initial breach at a river gauge 9 km downstream. Comparing this event with three other quantified arctic lake‐drainage floods suggests that lake surface area coupled with drainage gradient height can predict outburst flood magnitude. Using this relationship, we estimated future flood hazards from the 146 lakes with high drainage potential in the Arctic Coastal Plain (ACP) of northern Alaska, of which 20% are expected to generate outburst floods exceeding 100 m³/s to downstream rivers. This fortunate and detailed drainage‐event observation adds to a growing body of research on the impact of lakes on arctic hydrology, hazard forecasting in a region with an increasing human footprint, and broader processes of landscape evolution in arctic lowlands.